Author: Lianfa Li
This library provides the implementation of our method described in the paper “Enhanced SAR-to-Optical Image Translation: A Multimodal Knowledge-Guided Diffusion Model Approach”, which has been submitted to The Visual Computer.
Our approach leverages three modal conditions to guide the translation from Synthetic Aperture Radar (SAR) to optical images: the SAR image itself, a shared embedding space, and text descriptions.
Our framework concatenates the dual image modalities (SAR + shared embedding space) and applies cross-attention between the text descriptions and the translation encoder. This multimodal guidance significantly constrains the translation process, reduces randomness, and achieves state-of-the-art performance.
Framework of the proposed method
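To make the conditioning concrete, here is a minimal NumPy sketch (not the repository's code; all shapes and names are illustrative) of the two steps described above: channel-wise concatenation of the dual image modalities, followed by scaled dot-product cross-attention from image feature tokens to text tokens:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: image tokens attend to text tokens."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)      # (n_pixels, n_tokens)
    return softmax(scores, axis=-1) @ values    # (n_pixels, d)

# Toy dimensions (illustrative only): a downsampled 16x16 encoder feature
# map with d=32 channels per modality, conditioned on 8 text tokens.
rng = np.random.default_rng(0)
d = 32
sar_feat = rng.normal(size=(16, 16, d))         # SAR encoder features
map_feat = rng.normal(size=(16, 16, d))         # shared-embedding features

# Step 1: concatenate the dual modalities along the channel axis.
fused = np.concatenate([sar_feat, map_feat], axis=-1)   # (16, 16, 2d)

# Step 2: cross-attend flattened image tokens to text-prompt tokens.
text_tokens = rng.normal(size=(8, 2 * d))
img_tokens = fused.reshape(-1, 2 * d)
attended = cross_attention(img_tokens, text_tokens, text_tokens)
print(attended.shape)  # (256, 64)
```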
To validate our method, we conduct experiments on the WHU-OPT-SAR dataset (https://github.com/AmberHen/WHU-OPT-SAR-dataset), a comprehensive multi-modal remote sensing dataset that provides co-registered optical and SAR imagery for cross-modal analysis. The dataset covers approximately 51,448 km² in Hubei Province, China (30-33°N, 108-117°E) at 5-meter spatial resolution. It contains 100 image pairs, each measuring 5,556 × 3,704 pixels, providing substantial data for learning cross-modal correspondences.
`dataset/` - Data Processing Pipeline

- `datasampling.py` - Patch-based data sampling
- `defaulttextcode.py` - Default text description: "A remote sensing optical image"
- `gdifdataset.py` - Main data access interface with DataLoader
- `pixels2text.py` - Pixel and text encoding utilities
- `retrievePixVal.py` - Pixel-level unique value retrieval

Using this code, we partition the original images into 224×224 non-overlapping patches to balance computational efficiency and spatial context. After removing patches with excessive missing data, cloud coverage, or registration artifacts, we obtained approximately 29,400 valid patch pairs, split 90/10 for training and testing using a spatial splitting strategy.
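The patch-extraction step can be sketched as follows. This is an illustrative stand-in for the repository's `datasampling.py`, with the nodata value and missing-data threshold assumed:

```python
import numpy as np

def tile_patches(image, patch=224, nodata=0, max_missing=0.1):
    """Split an image into non-overlapping patch x patch tiles, dropping
    tiles whose fraction of nodata pixels exceeds max_missing.
    (Illustrative sketch; parameter names and defaults are assumptions.)"""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = image[y:y + patch, x:x + patch]
            if np.mean(tile == nodata) <= max_missing:
                patches.append(tile)
    return patches

# A 5556 x 3704 WHU-OPT-SAR scene yields floor(5556/224) * floor(3704/224)
# = 24 * 16 = 384 candidate tiles per image before quality filtering.
demo = np.ones((5556, 3704), dtype=np.uint8)
print(len(tile_patches(demo)))  # 384
```

With 100 image pairs, this upper bound of 38,400 candidates is consistent with the ~29,400 valid pairs remaining after filtering.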
`figs/` - Documentation Figures

Contains visualization assets and framework diagrams.
`guided_diffusion/` - Core Diffusion Module

- `gaussian_diffusion.py` - Gaussian diffusion model implementation
- `losses.py` - Custom loss functions
- `train_test.py` - Training and evaluation routines
- `unet.py` - U-Net backbone for diffusion parameter estimation

`mappingknow/` - Embedding Space Learning

- `swinunettrain.py` - Swin-UNet training for the shared embedding space
- `mappingtrain.py` - Supervised learning for the shared embedding space

Top-level scripts:

- `maintrain_s2on.py` - Main training script for the diffusion model
- `maintrain_s2on.sh` - Shell script wrapper for Linux training
- `mainpredict_s2on.py` - Inference script for image generation
- `mainpredict_s2on.sh` - Shell script wrapper for Linux prediction

Run the following command to train the multimodal diffusion model:
```bash
python maintrain_s2on.py \
    --gpu 1 \
    --data_path /dataset \
    --condition_way SARMAPTXT \
    --num_epochs 10000 \
    --output_root /tmp \
    --save_interval 1000 \
    --log_interval 1000 \
    --batch_size 40 \
    --map_swinunet /model_statedict_best.tor
```
Parameters:

- `--gpu` - GPU device ID
- `--data_path` - Path to training dataset
- `--condition_way` - Conditioning strategy:
  - `SARMAPTXT`: SAR + shared embedding space + text prompts (full model)
  - `SARMAP`: SAR + shared embedding space
  - `SARTXT`: SAR + text prompts
  - `SAR`: SAR only (baseline)
- `--num_epochs` - Total training epochs
- `--output_root` - Directory for model checkpoints and logs
- `--save_interval` - Checkpoint saving frequency
- `--log_interval` - Logging frequency
- `--batch_size` - Training batch size
- `--map_swinunet` - Path to the pretrained Swin-UNet model for the shared embedding space

For more parameters, please refer to the code file.
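A minimal `argparse` skeleton that mirrors the documented flags might look like the sketch below (defaults are taken from the example command above; `maintrain_s2on.py` remains the authoritative source for the real argument list):

```python
import argparse

def build_parser():
    """Sketch of a CLI parser for the documented training flags."""
    p = argparse.ArgumentParser(description="Train the multimodal diffusion model")
    p.add_argument("--gpu", type=int, default=0, help="GPU device ID")
    p.add_argument("--data_path", type=str, required=True,
                   help="Path to training dataset")
    p.add_argument("--condition_way", default="SARMAPTXT",
                   choices=["SARMAPTXT", "SARMAP", "SARTXT", "SAR"],
                   help="Conditioning strategy")
    p.add_argument("--num_epochs", type=int, default=10000)
    p.add_argument("--output_root", type=str, default="/tmp")
    p.add_argument("--save_interval", type=int, default=1000)
    p.add_argument("--log_interval", type=int, default=1000)
    p.add_argument("--batch_size", type=int, default=40)
    p.add_argument("--map_swinunet", type=str,
                   help="Pretrained Swin-UNet checkpoint")
    return p

args = build_parser().parse_args(
    ["--data_path", "/dataset", "--condition_way", "SARMAP", "--batch_size", "8"]
)
print(args.condition_way, args.batch_size)  # SARMAP 8
```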
Pretrained diffusion U-Net models are also available at https://github.com/lspatial/trained_mdiffusion/. On the WHU-OPT-SAR dataset, our model achieves an SSIM of 0.57 and a PSNR of 20.92 dB.
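For reference, PSNR is computed from the mean squared error as 10·log10(MAX²/MSE). A minimal NumPy sketch of this standard metric (not the repository's evaluation code):

```python
import numpy as np

def psnr(reference, generated, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = reference.astype(np.float64) - generated.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 2 gray levels gives MSE = 4, so
# PSNR = 10 * log10(255^2 / 4) ~= 42.1 dB.
a = np.full((8, 8), 100, dtype=np.uint8)
b = np.full((8, 8), 102, dtype=np.uint8)
print(round(psnr(a, b), 1))  # 42.1
```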
Run the following command to generate optical images from SAR inputs:
```bash
python mainpredict_s2on.py \
    --gpu 1 \
    --data_path "/predict_dataset" \
    --condition_way SARMAPTXT \
    --output_root "/tmp" \
    --map_swinunet /model_statedict_best.tor
```
Note: Ensure the pretrained Swin-UNet model (`model_statedict_best.tor`) is available before running inference.
Dependencies are listed in `requirements.txt` (if available).

If you use this code in your research, please cite our paper:
```bibtex
@article{li2025enhanced,
  title={Enhanced SAR-to-Optical Image Translation: A Multimodal Knowledge-Guided Diffusion Model Approach},
  author={Li, Lianfa},
  journal={The Visual Computer},
  note={under review},
  year={2025}
}
```
Lianfa Li
Email: lspatial@gmail.com
For questions, bug reports, or collaboration inquiries, please reach out via email.
[MIT, Apache 2.0, GPL]
Last updated: 2025